ProViDE: A software tool for accurate estimation of viral diversity in metagenomic samples

Authors

  • Tarini Shankar Ghosh
  • Monzoorul Haque Mohammed
  • Dinakar Komanduri
  • Sharmila Shekhar Mande
Abstract

Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values leaves most sequences unclassified. Conversely, using less stringent E-values results in a high number of incorrect taxonomic assignments. The SOrt-ITEMS algorithm provides an approach to address these issues. Based on alignment parameters, SOrt-ITEMS follows an elaborate workflow for assigning reads originating from hitherto unknown archaeal/bacterial genomes. In SOrt-ITEMS, alignment parameter thresholds were generated by observing patterns of sequence divergence within and across various taxonomic groups belonging to the bacterial and archaeal kingdoms. However, many taxonomic groups within the viral kingdom lack a typical Linnaean-like taxonomic hierarchy. In this paper, we present ProViDE (Program for Viral Diversity Estimation), an algorithm that uses a customized set of alignment parameter thresholds specifically suited to viral metagenomic sequences. These thresholds capture the pattern of sequence divergence and the non-uniform taxonomic hierarchy observed within and across various taxonomic groups of the viral kingdom. Validation results indicate that the percentage of 'correct' assignments by ProViDE is around 1.7 to 3 times higher than that of the widely used similarity-based method MEGAN. The misclassification rate of ProViDE is around 3 to 19% (compared with 5 to 42% for MEGAN), indicating significantly better assignment accuracy. The ProViDE software and a supplementary file (containing the supplementary figures and tables referred to in this article) are available for download from http://metagenomics.atc.tcs.com/binning/ProViDE/
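
The E-value trade-off described in the abstract can be illustrated with a minimal sketch. It is not the ProViDE or SOrt-ITEMS algorithm, and it uses none of their thresholds; it only assumes BLAST hits in the standard 12-column tabular layout, with made-up read names, subject names, and cutoffs chosen purely for illustration.

```python
# Hypothetical BLAST tabular rows, assuming the default 12-column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE_HITS = (
    "read1\tvirusA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "read2\tvirusB\t62.0\t90\t34\t0\t1\t90\t5\t94\t1e-04\t45\n"
    "read3\tvirusC\t55.0\t80\t36\t0\t1\t80\t1\t80\t2.0\t28\n"
)


def classify_reads(tabular_text, max_evalue):
    """Return (best hit per read at or below max_evalue, sorted unclassified reads)."""
    best = {}
    all_reads = set()
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        read_id, subject, evalue = fields[0], fields[1], float(fields[10])
        all_reads.add(read_id)
        # Keep the lowest-E-value hit that passes the cutoff.
        if evalue <= max_evalue and (read_id not in best or evalue < best[read_id][1]):
            best[read_id] = (subject, evalue)
    unclassified = sorted(all_reads - best.keys())
    return best, unclassified


# Illustrative cutoffs only: a stringent one and a lenient one.
strict_hits, strict_unclassified = classify_reads(SAMPLE_HITS, 1e-10)
lenient_hits, lenient_unclassified = classify_reads(SAMPLE_HITS, 10.0)

print("stringent cutoff, unclassified:", strict_unclassified)
print("lenient cutoff, unclassified:", lenient_unclassified)
```

With the stringent cutoff, only the near-identical read is assigned and the divergent reads stay unclassified; with the lenient cutoff every read receives an assignment, including the weak hit whose assignment is the kind most likely to be wrong. Reads with no BLAST hit at all never appear in tabular output, so a real pipeline would need the full read list to count them.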

Journal title:

Volume 6, Issue

Pages -

Publication date: 2011